Abstract


Potholes present a substantial hazard to road safety, resulting in accidents and impeding the smooth flow of 
traffic. This issue is particularly salient in developing nations such as Nigeria, where proactive and effective 
pothole management is imperative. The present study addresses this challenge by advocating a pioneering 
methodology employing an Enhanced Faster R-CNN algorithm that amalgamates EfficientNet and Faster R-CNN 
techniques. The primary objective of this model is to enhance pothole detection accuracy, with a specific focus on 
facilitating the operations of autonomous vehicles within environments characterized by resource limitations. By 
combining the efficiency of the Lightweight Faster R-CNN with EfficientNet, the proposed model
attains an accuracy of 97.7%. This performance surpasses that of established architectures including
MobileNetV2, ResNet50, VGG16, and Inception V3. These findings underscore the efficacy of the model in
real-time pothole detection and its potential to substantially improve road safety and traffic
management in developing regions.
1. Introduction


Recent advancements in artificial intelligence (AI) have catalyzed the emergence of Autonomous Vehicle 
Systems (AVS) within the automotive industry. However, ensuring human safety under real-world driving 
conditions remains a formidable challenge for these industries. The World Health Organization has documented 
elevated rates of road fatalities, predominantly attributable to negligent driving behaviors and insufficient 
comprehension of the driving environment. In response, scholars have delved into the application of deep learning 
and computer vision methodologies for comprehending the driving environment, encompassing tasks such as 
feature extraction, classification, detection, and tracking. 


Autonomous vehicles, colloquially termed self-driving cars, are undergoing intensive investigation and 
development by numerous technological enterprises and academic institutions. The overarching objective is to 
establish safe and efficient navigation through the amalgamation of AI and computer vision techniques. However, 
the hurdles associated with object detection and recognition persist, particularly amidst fluctuating lighting and 
weather conditions. The utilization of deep learning and computer vision for understanding the driving 
environment stands as a viable strategy to surmount these challenges. Notably, the detection of road anomalies, 
including potholes, assumes paramount importance for AVS. Deep learning, in conjunction with computer vision, 
harbors the potential to furnish cost-effective and resilient solutions for autonomous driving endeavors. A case in 
point is the deployment of a deep learning-driven solution for pothole detection in self-driving vehicles, which 
surpasses conventional image processing techniques.


Detecting road irregularities, such as potholes, constitutes a pivotal facet of AVS functionality. Various
methodologies have been explored for pothole detection, encompassing thermal and optical camera systems. 
Nonetheless, existing approaches exhibit constrained performance and often entail high costs or intricate setups. 
A parsimonious, vision-based deep learning model has been proposed to enhance pothole detection efficacy for 
AVS, thereby showcasing its adaptive intelligence in evading potholes to avert potential accidents. However, the 
decision-making capabilities of AVS predicated on detected potholes warrant further investigation. Alhaji et al. 
(2022) and Oyekanmi and Ejem (2022) have underscored the pervasiveness of extensive potholes and road 
depressions in specific regions, exacerbating traffic congestion. 


The fusion of deep learning and computer vision holds promise in furnishing economical and resilient solutions 
for the autonomous driving sector. Leveraging Convolutional Neural Networks (CNNs) has substantially 
enhanced image classification accuracy vis-a-vis erstwhile techniques reliant on manually engineered feature 
extractors and classifiers. By training high-capacity models with sparse annotated detection data, CNNs afford 
superior object detection performance relative to systems predicated on rudimentary features. Notably, in the 
domain of pothole detection for self-driving vehicles, a deep learning-driven approach has been unveiled, 
delivering robust pothole detection capabilities that surpass conventional image processing methods. Embracing 
lightweight deep learning techniques, such as Lightweight Faster R-CNN, yields expedited processing compared 
to alternative methodologies (Gayathri & Thangavelu, 2021). 


The utilization of autonomous vehicles for pothole detection remains nascent. While autonomous vehicle 
technology has witnessed significant strides, infrastructure, road conditions, and regulatory frameworks 
necessitate refinement to actualize the widespread deployment of autonomous vehicles. Realizing the integration 
of autonomous vehicles will entail investments in infrastructure upgrades, technological adaptation, and the 
formulation of clear regulatory frameworks for autonomous driving. While widespread adoption of autonomous
vehicles may take time, the attendant benefits, including heightened safety, efficiency, and
reduced human error in driving, are poised to encourage the adoption of autonomous vehicle technology in
Nigeria.


2. Related Works 


In recent years, computer vision-based approaches to road pothole detection have used vehicle-mounted
cameras together with image processing techniques such as edge detection and blob analysis, as well as deep
learning algorithms. These approaches offer flexibility and robustness under varied road and lighting conditions.
Ma et al. (2022) present a state-of-the-art review of computer vision systems and algorithms for road imaging
and pothole detection. Classical 2-D image processing, the traditional technique, is widely used in pothole
detection, but it has several limitations: sensitivity to image quality; difficulty in handling occlusions and
viewpoint changes; limited ability to cope with variable image conditions such as illumination changes, camera
distortions, and background clutter; reliance on manual feature extraction; and limited generalization.
Additional techniques are therefore required to improve its performance on real-world scenes.


Sensor-based Approaches for road pothole detection utilize smart sensors and actuators in vehicles and 
infrastructure, gathering data to enhance automated driving. These approaches rely on on-board sensors like 
LiDAR, radar, cameras, and external sensors to perceive surroundings. Sensors such as LiDAR and radar are
employed to detect and identify potholes, providing accurate results. However, these sensors are expensive and
require more advanced setups than computer vision-based methods.


Figure 1: Camera and laser sensor for pothole detection on an autonomous car (Source: Vupparaboina et al.,
2015)


Bosi et al. (2019) introduced the concept of Virtual Sensor for Pothole Detection within Vehicle-to-Everything 
(V2X) systems. Employed in autonomous vehicles, this Virtual Sensor aids in discerning road conditions, thus 
enhancing navigation and potentially facilitating road maintenance notifications. 


Raja et al. (2022) exhibited the development of an intelligent strategy for pothole avoidance in autonomous 
vehicles. Their model demonstrated notable efficacy in navigating potholes, indicating its potential to enhance 
safety and reliability in autonomous vehicle operations. 


Despite its promise, the Virtual Sensor Concept for Pothole Detection encounters several challenges, including 
the imperative for precise sensor data, real-time processing capabilities, dependable communication, and 
limitations in certain environmental contexts. Issues such as suboptimal sensor data quality, computational 
resource constraints, disrupted communication channels, and adverse weather conditions may compromise 
accuracy, necessitating supplementary solutions to address these hurdles. 


Deep Convolutional Neural Networks (DCNNs) have emerged as a popular paradigm for road pothole detection, 
propelled by recent strides in machine/deep learning. Three primary types of DCNNs are prevalent in this domain: 
image classification networks, object detection networks, and semantic segmentation networks. 


Image classification networks are trained to categorize road images as either positive (containing potholes) or 
negative (pothole-free). Object detection networks focus on pinpointing road potholes at an instance level. 
Semantic segmentation networks aim to partition road images into pixel-level or semantic-level representations 
conducive to pothole detection. 


These networks undergo training via back-propagation algorithms, leveraging extensive human-annotated road 
data. Data-driven road pothole detection strategies have gained traction due to their independence from explicit
parameters for road image or point cloud segmentation. 


Before deep learning technologies proliferated, researchers relied predominantly on classical image
processing algorithms to generate manually engineered visual features, and then trained models on those
features to classify road image patches. CNN-based models have since become the prominent approach.


Ahmed (2021) pioneered Smart Pothole Detection utilizing deep learning techniques centered around dilated
convolution. While this methodology exhibits promise in furnishing accurate and resilient road pothole detection, 
it contends with scalability limitations and dependencies on training data quality and computational resources. 


Kharel and Ahmed (2021) investigated the integration of Inverse Perspective Mapping and CNNs for real-time 
pothole detection and area estimation via image processing. Although CNNs demonstrate adequate pothole 
detection accuracy, challenges such as the requisite for abundant labelled training data, computational demands, 
and variations in pothole appearances hamper their efficacy. Addressing these challenges mandates the adoption 
of supplementary techniques like data augmentation and transfer learning. 


Yik et al. (2021) proposed a real-time pothole detection framework predicated on deep learning methodologies, 
employing the YOLOv3 algorithm. While offering potential for real-time and precise road pothole detection, this 
approach is not devoid of scalability constraints and dependencies on training data and image quality. 


Dutta and Chakraborty (2020) devised a novel Convolutional Neural Network-based model tailored for all-terrain 
autonomous driving scenarios. While the Convolutional Neural Network Model demonstrates promise in enabling 
autonomous driving across diverse terrains, it grapples with challenges related to robustness, dependencies on 
training data, generalizability, and computational complexity. 


Feng et al. (2022) leveraged a Segmentation of Road Potholes with Multimodal Attention Fusion Network for 
Autonomous Vehicles. While this methodology promises accurate and robust road pothole detection, it is 
encumbered by scalability limitations, dependencies on training data quality, and computational costs. 


Gupta and Dixit (2022) explored a hybrid approach integrating FactorNet into Faster R-CNN for pothole 
detection in autonomous vehicles. Despite its potential for accurate and efficient pothole detection, this hybrid 
approach is beset by complexity, heightened computational demands, and dependencies on training data. 


Kortmann et al. (2022) devised a comprehensive pothole detection system tailored for end-to-end autonomous 
driving scenarios, integrating low-cost pre-installed sensors, cloud- and crowd-based HD Feature Maps, and 
lightweight deep learning techniques. While promising real-time and accurate pothole detection, this end-to-end 
system contends with scalability issues, dependencies on sensor quality, and reliance on crowd-sourced data. 


Ma et al. (2022) explored a hybrid approach amalgamating classical 2-D image processing, 3-D point cloud 
modelling, and Convolutional Neural Networks (CNNs) for pothole detection. This hybridization offers 
advantages such as enhanced accuracy and robustness but introduces complexities, heightened computational 
demands, and elevated data requirements. Although promising, the hybrid approach necessitates considerable 
resources and expertise. 


Manalo et al. (2022) proposed a Transfer Learning-Based System for Pothole Detection employing Deep 
Convolutional Neural Networks. While exhibiting strengths in terms of improved accuracy, efficient training, and 
robustness, this system is constrained by dependencies on pre-trained models, limited adaptability, and 
complexity. 


The limitations of deep learning in autonomous vehicle pothole detection encompass challenges such as data 
quality dependencies, overfitting, computational complexities, privacy concerns, unstructured data challenges, 
and limited model generalization. Chen et al. (2022) advocate for an effective method utilizing EfficientNet B4 
for thermal image analysis, offering reduced computational complexities and heightened accuracy. Renowned for 
its efficiency, the EfficientNet architecture holds promise for deployment in resource-constrained platforms like 
autonomous vehicles. Employing it as the backbone network for a Lightweight Faster R-CNN network in pothole 
recognition endeavours promises heightened accuracy and efficiency, facilitating lightweight and efficient neural
networks conducive to autonomous vehicle applications (Tang et al., 2021; Jebamikyous & Kashef, 2022).
Nevertheless, prevailing studies often suffer from inadequate model generalization and potential biases, 
overlooking road conditions prevalent in developing nations. To address these disparities, this study endeavours 
to curate a dataset reflective of Nigerian road conditions, utilizing it to train a Potholes Recognition model 
incorporating Lightweight Faster R-CNN and EfficientNet for robust navigational capabilities. 


Wang and Li (2022) affirm the efficacy of the EfficientNet architecture, renowned for its high accuracy and 
efficiency, rendering it a favoured choice for deployment in resource-constrained platforms such as autonomous 
vehicles. The lightweight nature and reduced computational complexities of the EfficientNet architecture render it 
apt for integration into systems with limited processing capabilities, such as automotive platforms. Leveraging 
this architecture as the backbone network for a Lightweight Faster R-CNN network holds promise for achieving 
heightened accuracy and efficiency in real-time pothole detection for autonomous vehicles. 


2. Methodology


This paper explores the use of a Lightweight EfficientNet Faster R-CNN, a variant of object detection
networks, for pothole recognition within autonomous vehicle systems. As a refinement of the Faster R-CNN
network, this model is designed to offer improved computational efficiency without compromising accuracy.
The architecture of the proposed model is presented in Figure 2.


Figure 2: Proposed Lightweight Faster R-CNN model (Source: Ahmed, 2021).


The network architecture of the Lightweight Faster R-CNN comprises two primary components: a backbone
network and a pair of task-specific branches. The backbone, a convolutional neural network (CNN), extracts
features from the input image. In this study, EfficientNet serves as the backbone and generates a set of
feature maps that are passed to the two task-specific branches.


The first branch, known as the Region Proposal Network (RPN), generates proposals, that is, candidate
regions likely to contain objects of interest. The RPN derives these proposals from anchor boxes and evaluates
their quality through a set of binary classification scores. Proposals with the highest scores progress to the
second branch for further scrutiny.
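

The anchor mechanism described above can be illustrated with a minimal sketch. The anchor sizes (32, 64, 128) follow Table 1; the aspect ratios, the feature-map stride of 16, and the function names generate_anchors and top_k_proposals are assumptions made for illustration and are not specified in this paper.

```python
from itertools import product


def generate_anchors(feat_h, feat_w, stride=16, sizes=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred on every feature-map cell.

    sizes follow Table 1; stride and ratios are assumed values.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for size, ratio in product(sizes, ratios):
            # ratio = height / width, chosen so the anchor area stays size**2
            w = size / ratio ** 0.5
            h = size * ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def top_k_proposals(anchors, scores, k):
    """Keep the k anchors with the highest objectness scores."""
    ranked = sorted(zip(scores, anchors), key=lambda pair: pair[0], reverse=True)
    return [anchor for _, anchor in ranked[:k]]


# A 224x224 input with an assumed stride of 16 gives a 14x14 feature map.
anchors = generate_anchors(14, 14)
print(len(anchors))  # 14 * 14 cells * 9 anchors per cell = 1764
```

In a full detector the scores would come from the RPN's binary classifier; here they are simply passed in as a list.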


The second branch, termed the detection branch, is responsible for object classification within the proposals 
identified by the RPN. Employing a series of fully connected layers, the detection branch categorizes the objects 
and formulates the ultimate bounding boxes delineating their spatial positions within the image. 
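

To make the bounding-box step concrete, the sketch below decodes predicted offsets into a final box using the standard Faster R-CNN parameterisation. Whether the proposed model uses exactly this parameterisation is an assumption, and the function name decode_box is hypothetical.

```python
import math


def decode_box(proposal, deltas):
    """Apply (dx, dy, dw, dh) offsets to an (x1, y1, x2, y2) proposal."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    # Shift the centre in units of the proposal size; scale width and height exponentially.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)
    return (new_cx - new_w / 2, new_cy - new_h / 2, new_cx + new_w / 2, new_cy + new_h / 2)


# Zero offsets leave the proposal unchanged.
print(decode_box((10, 10, 50, 30), (0.0, 0.0, 0.0, 0.0)))
```

Predicting offsets relative to a proposal, rather than absolute coordinates, keeps the regression targets small and roughly scale-independent.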


Training the Light-Weight Faster R-CNN network entails utilizing a sizable dataset of annotated images, 
comprising both positive samples (depicting potholes) and negative samples (devoid of potholes). The training 
regimen integrates supervised learning and reinforcement learning methodologies, aimed at minimizing the loss 
function and refining the model's accuracy. 


Upon completion of training, the network transitions to deployment, where it engages in real-world scenarios to 
execute pothole recognition tasks. During inference, the network ingests an input image and generates a repertoire 
of proposals and bounding boxes pinpointing the pothole locations within the image. Subsequently, the network's 
output serves as navigational guidance for autonomous vehicles, facilitating safe navigation around potholes and 
averting potential vehicular damage. 


In summary, the Lightweight Faster R-CNN network emerges as a robust and efficient solution for pothole
recognition in autonomous vehicle settings, combining the merits of CNNs and object detection algorithms to
yield precise and computationally efficient outcomes. The integration of the EfficientNet backbone network
enhances performance by balancing model complexity and accuracy, which improves efficiency and reduces
computational overhead. These attributes are crucial for lightweight and efficient neural networks,
particularly in domains like autonomous vehicle technology (Tang et al., 2021; Jebamikyous & Kashef, 2022).


Pothole images were collected using a vehicle equipped with cameras to capture various potholes
across Kaduna metropolis. These images were supplemented with 665 pothole images sourced from COCO
datasets. In total, 800 images were amassed and subsequently used for training.


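\[
\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{1}
\]

\[
\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{2}
\]

\[
\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}} \tag{3}
\]

\[
F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}
\]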
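

As a worked illustration of the four metrics, the following sketch computes them from confusion-matrix counts. The counts used here are hypothetical and chosen only to show the arithmetic; they are not taken from the experiments in this paper.

```python
def precision(tp, fp):
    """Fraction of predicted potholes that are real potholes."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of real potholes that the model found."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 90, 95, 5, 10
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 4), round(r, 4), round(accuracy(tp, tn, fp, fn), 4), round(f1_score(p, r), 4))
# prints: 0.9474 0.9 0.925 0.9231
```

The same f1_score function reproduces the F1 score reported for the proposed model: a precision of 92.1 and a recall of 90.56 give approximately 91.32.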


3. Results and Discussion

This study trained Faster R-CNN models with VGG16, Inception V3, MobileNetV2, and ResNet50
backbones, alongside the Improved Faster R-CNN utilizing the EfficientNet network. The objective was to
evaluate the baseline models and compare their performance metrics against those of the proposed model.
The parameters used for the models are shown in Table 1.


Table 1: Faster R-CNN model parameters

Parameter                        Value
Input image size                 224x224 pixels
Number of convolutional layers   16
Activation function              ReLU
Anchor box sizes                 [32, 64, 128]
Pooling size                     7x7
Hidden layer units               256
Number of output classes         2
Learning rate                    0.001
Batch size                       16
Number of epochs                 50
RPN loss                         Binary cross-entropy
Classification loss              Categorical cross-entropy
Regression loss                  Smooth L1 loss


Table 1 outlines the parameters of Faster R-CNN models utilized for pothole detection. The input image size is set 
at 224x224 pixels, and the models comprise 16 convolutional layers with ReLU activation function. The anchor 
box sizes used for object localization are [32, 64, 128], while the pooling size is 7x7. The hidden layer unit 
consists of 256 neurons. These models are designed to classify images into two classes: pothole and non-pothole. 
The learning rate is set at 0.001, with a batch size of 16 and training conducted over 50 epochs. Loss functions 
employed include binary cross-entropy for the Region Proposal Network (RPN), categorical cross-entropy for 
classification, and smooth L1 loss for regression. These parameters are crucial for optimizing the performance of 
the Faster R-CNN models in accurately detecting potholes on roads. The results and evaluation metrics of the
Improved Faster R-CNN with EfficientNet model are shown in Table 2.
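

The three loss functions listed in Table 1 can be sketched in plain Python as follows. This is a per-sample illustration under stated assumptions, not the training code used in this study: the smoothing threshold beta = 1.0 and the clipping constant eps are assumed values, and the function names are hypothetical.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """RPN objectness loss for one anchor (y_true is 0 or 1)."""
    p = min(max(p_pred, eps), 1 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_cross_entropy(probs, true_index, eps=1e-7):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(max(probs[true_index], eps))


def smooth_l1(pred, target, beta=1.0):
    """Regression loss: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff ** 2 / beta
    return diff - 0.5 * beta


print(round(binary_cross_entropy(1, 0.9), 4))
print(round(categorical_cross_entropy([0.2, 0.8], 1), 4))
print(smooth_l1(0.5, 0.0), smooth_l1(3.0, 0.0))
```

The linear tail of the smooth L1 loss limits the influence of badly localized boxes on the gradient, which is the usual motivation for preferring it over a squared error in box regression.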


Table 2: Performance results of the Improved Faster R-CNN with EfficientNet model

Description      Improved Faster R-CNN with EfficientNet
Accuracy (%)     97.7
Precision (%)    92.1
Recall (%)       90.56
F1-score (%)     91.32


Table 2 presents the performance metrics of the Improved Faster R-CNN model utilizing the EfficientNet
backbone for pothole detection. Accuracy indicates the overall correctness of the model's predictions; an
accuracy of 97.7% suggests that the model identified pothole and non-pothole regions with high
reliability. A precision of 92.1% indicates that when the model predicted a region as a pothole, it was correct
approximately 92.1% of the time. A recall of 90.56% implies that the model identified around
90.56% of all potholes present in the dataset. An F1 score of 91.32% indicates a good balance between
precision and recall, reflecting the model's ability to identify potholes while limiting false positives
and false negatives. These results underscore the efficacy of the Improved Faster R-CNN with
EfficientNet in pothole detection. Such performance is essential for reliable and efficient detection of
potholes on roadways, thereby contributing to enhanced road safety and maintenance efforts.


This work presented an enhanced Faster R-CNN architecture incorporating the EfficientNet backbone, 
demonstrating significant proficiency in pothole detection. The proposed model achieved superior accuracy, 
precision, recall, and F1-score, making it suitable for real-time deployment on roadways prone to pothole hazards,
particularly in resource-constrained environments. The integration of EfficientNet enabled an efficient feature 
extraction while maintaining high detection performance, paving the way for practical pothole identification and 
mitigation strategies. 


Table 3 summarizes the classification metrics obtained after simulation of the five models.


Table 3: Model Comparison (all values in %)

Description   VGG16   Inception V3   MobileNetV2   ResNet50      Improved Faster R-CNN with EfficientNet
Accuracy      95.22   94.12          95.62         [illegible]   97.7
Precision     89.41   [illegible]    90.32         90.62         92.1
Recall        87.50   83.70          89.60         87.21         90.56
F1-score      88.44   84.59          89.96         88.88         91.32


Table 3 provides a concise overview of the key performance metrics, encompassing accuracy, precision, recall,
and F1-score.


The VGG16 model exhibited satisfactory accuracy, precision, and recall, indicating robust detection capabilities, 
albeit slightly lagging behind the EfficientNet-based model. Inception V3 demonstrated a satisfactory accuracy 
but registered lower precision and recall, yielding a moderate overall performance. MobileNetV2 demonstrated a 
balanced performance across metrics, albeit marginally below the EfficientNet model. Similarly, ResNet50 
maintained balance in performance metrics, albeit with slightly reduced accuracy and precision compared to the 
proposed model. 


The Improved Faster R-CNN architecture with EfficientNet integration emerged as the standout performer, 
achieving the highest accuracy (97.7%), precision (92.1%), recall (90.56%), and F1-score (91.32%). These
outstanding metrics established its position as the leading choice for real-time pothole detection applications. 
Despite potential concerns regarding overfitting, the model showcases exceptional performance relative to 
alternative architectures investigated. 
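

The comparison can be checked with a short script. The sketch below ranks the five models by the F1-scores reported in Table 3 and, as a consistency check, solves Equation (4) for the precision implied by each reported recall and F1-score. The dictionary layout and the function name implied_precision are illustrative choices, not part of the original experiments.

```python
# Recall and F1-score (%) as reported in Table 3.
results = {
    "VGG16": {"recall": 87.50, "f1": 88.44},
    "Inception V3": {"recall": 83.70, "f1": 84.59},
    "MobileNetV2": {"recall": 89.60, "f1": 89.96},
    "ResNet50": {"recall": 87.21, "f1": 88.88},
    "Improved Faster R-CNN with EfficientNet": {"recall": 90.56, "f1": 91.32},
}


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P."""
    return f1 * recall / (2 * recall - f1)


# Rank models from highest to lowest F1-score.
ranking = sorted(results, key=lambda name: results[name]["f1"], reverse=True)
for name in ranking:
    r, f1 = results[name]["recall"], results[name]["f1"]
    print(f"{name}: F1={f1:.2f}, implied precision={implied_precision(r, f1):.2f}")
```

For the proposed model, the implied precision agrees with the reported 92.1% to within rounding, which suggests the reported precision, recall, and F1-score are mutually consistent.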


4. Conclusion 

This paper investigated the development of improved pothole detection systems by implementing the Enhanced 
Faster R-CNN with EfficientNet algorithm. Initiated with an extensive review of existing systems, the study 
aimed to address the domain's challenges. The proposed algorithm, leveraging the strengths of both architectures, 
demonstrated significant advancements in accuracy, computational efficiency, and overall effectiveness for
real-world applications. The research methodology encompassed systematic data collection, model development, and
rigorous evaluation. The resulting Pothole Recognition model, integrating EfficientNet within the Faster R-CNN
framework, underwent comprehensive benchmarking, showcasing impressive accuracy and precision metrics.


The findings highlight the model's efficacy, achieving an accuracy of 97.7%, precision of 92.1%, recall of 
90.56%, and F1 score of 91.32%. This success offers a valuable contribution to pothole detection technology, 
providing a practical and efficient solution for real-world applications and paving the way for further 
advancements. Future research avenues identified include: exploration of advanced neural network architectures;
incorporation of multimodal data sources; real-world deployment and testing; development of dynamic pothole
classification systems; implementation of transfer learning techniques; and utilization of user feedback for
model improvement. These potential directions aim to enhance pothole detection systems globally, ultimately
contributing to safer and well-maintained road infrastructure.